学习来自数据样本的给定策略的价值函数是强化学习中的重要问题。TD($ \ lambda $)是一个流行的算法,可以解决这个问题。但是,分配给不同$ n $ -step的权重在参数$ \ lambda $控制的TD($ \ lambda $)中返回,随着$ n $的增加,呈指数级增长。在本文中,我们展示了一个$ \ lambda $ -schedule程序,将TD($ \ lambda $)算法概括为参数$ \ lambda $的情况随时间步骤而异。这允许通过选择序列$ \ {\ lambda_t \} $ \ {t \ geq 1} $来指定重量分配中的灵活性,即,用户可以指定分配给不同$ n $ -step返回的权重。基于此过程,我们提出了一个on-police算法 - TD($ \ lambda $) - 计划和两个offoly almorithms - gtd($ \ lambda $) - 计划和tdc($ \ lambda $) - 计划,分别。我们提供了一般马尔可夫噪声框架下所有三种算法的几乎肯定融合的证据。
translated by 谷歌翻译
梯度时间差(梯度TD)算法是用于钢筋学习中的政策评估的流行随机近似(SA)算法。在这里,我们考虑具有额外的重球动量项的梯度TD算法,并提供阶梯尺寸和动量参数的选择,确保这些算法的几乎肯定偶然的趋势。在这样做时,我们将沉重的球梯度TD分解为三个单独的迭代,具有不同的步骤尺寸。我们首先使用当前文献的结果进行一次时间尺度SA设置分析这些迭代。但是,一时间时间形案例是限制性的,并且可以通过查看迭代的三次时间尺度分解来提供更一般的分析。在此过程中,我们提供了一般三次Timescale SA的稳定性和融合的第一个条件。然后,我们证明了沉重的球梯度TD算法使用我们的三个时间尺度SA分析来收敛。最后,我们在标准RL问题上评估了这些算法,并报告了Vanilla算法的性能提高。
translated by 谷歌翻译
随机重球(SHB)和Nesterov的加速随机梯度(ASG)是随机优化的流行动量方法。尽管对确定性环境中这种加速思想的好处有充分的理解,但它们在随机优化方面的优势仍然尚不清楚。实际上,在某些特定情况下,众所周知,在样本复杂性意义上,动量无济于事。我们的工作表明,类似的结果实际上是整个二次优化的。具体而言,我们为该家族获得了SHB和ASG样品复杂性的下限,并表明Vanilla SGD可以实现相同的结合。我们注意到,存在二次优化中基于动量方法的优势的结果,但这些方法基于单方面或有缺陷的分析。
translated by 谷歌翻译
Embedding words in vector space is a fundamental first step in state-of-the-art natural language processing (NLP). Typical NLP solutions employ pre-defined vector representations to improve generalization by co-locating similar words in vector space. For instance, Word2Vec is a self-supervised predictive model that captures the context of words using a neural network. Similarly, GLoVe is a popular unsupervised model incorporating corpus-wide word co-occurrence statistics. Such word embedding has significantly boosted important NLP tasks, including sentiment analysis, document classification, and machine translation. However, the embeddings are dense floating-point vectors, making them expensive to compute and difficult to interpret. In this paper, we instead propose to represent the semantics of words with a few defining words that are related using propositional logic. To produce such logical embeddings, we introduce a Tsetlin Machine-based autoencoder that learns logical clauses self-supervised. The clauses consist of contextual words like "black," "cup," and "hot" to define other words like "coffee," thus being human-understandable. We evaluate our embedding approach on several intrinsic and extrinsic benchmarks, outperforming GLoVe on six classification tasks. Furthermore, we investigate the interpretability of our embedding using the logical representations acquired during training. We also visualize word clusters in vector space, demonstrating how our logical embedding co-locate similar words.
translated by 谷歌翻译
Agile robotics presents a difficult challenge with robots moving at high speeds requiring precise and low-latency sensing and control. Creating agile motion that accomplishes the task at hand while being safe to execute is a key requirement for agile robots to gain human trust. This requires designing new approaches that are flexible and maintain knowledge over world constraints. In this paper, we consider the problem of building a flexible and adaptive controller for a challenging agile mobile manipulation task of hitting ground strokes on a wheelchair tennis robot. We propose and evaluate an extension to work done on learning striking behaviors using a probabilistic movement primitive (ProMP) framework by (1) demonstrating the safe execution of learned primitives on an agile mobile manipulator setup, and (2) proposing an online primitive refinement procedure that utilizes evaluative feedback from humans on the executed trajectories.
translated by 谷歌翻译
When testing conditions differ from those represented in training data, so-called out-of-distribution (OOD) inputs can mar the reliability of black-box learned components in the modern robot autonomy stack. Therefore, coping with OOD data is an important challenge on the path towards trustworthy learning-enabled open-world autonomy. In this paper, we aim to demystify the topic of OOD data and its associated challenges in the context of data-driven robotic systems, drawing connections to emerging paradigms in the ML community that study the effect of OOD data on learned models in isolation. We argue that as roboticists, we should reason about the overall system-level competence of a robot as it performs tasks in OOD conditions. We highlight key research questions around this system-level view of OOD problems to guide future research toward safe and reliable learning-enabled autonomy.
translated by 谷歌翻译
Efforts to improve the adversarial robustness of convolutional neural networks have primarily focused on developing more effective adversarial training methods. In contrast, little attention was devoted to analyzing the role of architectural elements (such as topology, depth, and width) on adversarial robustness. This paper seeks to bridge this gap and present a holistic study on the impact of architectural design on adversarial robustness. We focus on residual networks and consider architecture design at the block level, i.e., topology, kernel size, activation, and normalization, as well as at the network scaling level, i.e., depth and width of each block in the network. In both cases, we first derive insights through systematic ablative experiments. Then we design a robust residual block, dubbed RobustResBlock, and a compound scaling rule, dubbed RobustScaling, to distribute depth and width at the desired FLOP count. Finally, we combine RobustResBlock and RobustScaling and present a portfolio of adversarially robust residual networks, RobustResNets, spanning a broad spectrum of model capacities. Experimental validation across multiple datasets and adversarial attacks demonstrate that RobustResNets consistently outperform both the standard WRNs and other existing robust architectures, achieving state-of-the-art AutoAttack robust accuracy of 61.1% without additional data and 63.7% with 500K external data while being $2\times$ more compact in terms of parameters. Code is available at \url{ https://github.com/zhichao-lu/robust-residual-network}
translated by 谷歌翻译
Despite recent progress towards scaling up multimodal vision-language models, these models are still known to struggle on compositional generalization benchmarks such as Winoground. We find that a critical component lacking from current vision-language models is relation-level alignment: the ability to match directional semantic relations in text (e.g., "mug in grass") with spatial relationships in the image (e.g., the position of the mug relative to the grass). To tackle this problem, we show that relation alignment can be enforced by encouraging the directed language attention from 'mug' to 'grass' (capturing the semantic relation 'in') to match the directed visual attention from the mug to the grass. Tokens and their corresponding objects are softly identified using the cross-modal attention. We prove that this notion of soft relation alignment is equivalent to enforcing congruence between vision and language attention matrices under a 'change of basis' provided by the cross-modal attention matrix. Intuitively, our approach projects visual attention into the language attention space to calculate its divergence from the actual language attention, and vice versa. We apply our Cross-modal Attention Congruence Regularization (CACR) loss to UNITER and improve on the state-of-the-art approach to Winoground.
translated by 谷歌翻译
Despite the recent progress in language generation models, their outputs may not always meet user expectations. In this work, we study whether informational feedback in natural language can be leveraged to improve generation quality and user preference alignment. To this end, we consider factual consistency in summarization, the quality that the summary should only contain information supported by the input documents, for user preference alignment. We collect a high-quality dataset, DeFacto, containing human demonstrations and informational feedback in natural language consisting of corrective instructions, edited summaries, and explanations with respect to the factual consistency of the summary. Using our dataset, we study two natural language generation tasks: 1) editing a summary using the human feedback, and 2) generating human feedback from the original summary. Using the two tasks, we further evaluate if models can automatically correct factual inconsistencies in generated summaries. We show that the human-edited summaries we collected are more factually consistent, and pre-trained language models can leverage our dataset to improve the factual consistency of original system-generated summaries in our proposed generation tasks. We make the DeFacto dataset publicly available at https://github.com/microsoft/DeFacto.
translated by 谷歌翻译
Heart failure remains a major public health challenge with growing costs. Ejection fraction (EF) is a key metric for the diagnosis and management of heart failure however estimation of EF using echocardiography remains expensive for the healthcare system and subject to intra/inter operator variability. While chest x-rays (CXR) are quick, inexpensive, and require less expertise, they do not provide sufficient information to the human eye to estimate EF. This work explores the efficacy of computer vision techniques to predict reduced EF solely from CXRs. We studied a dataset of 3488 CXRs from the MIMIC CXR-jpg (MCR) dataset. Our work establishes benchmarks using multiple state-of-the-art convolutional neural network architectures. The subsequent analysis shows increasing model sizes from 8M to 23M parameters improved classification performance without overfitting the dataset. We further show how data augmentation techniques such as CXR rotation and random cropping further improves model performance another ~5%. Finally, we conduct an error analysis using saliency maps and Grad-CAMs to better understand the failure modes of convolutional models on this task.
translated by 谷歌翻译